Virtual Cluster For HPC Education
Abstract
For many institutions, it is challenging to procure and maintain resources to
teach parallel and distributed computing. While existing dedicated environments
such as XSEDE are available, they often have a high level of utilization, leading
to difficulty in supporting real-time, hands-on, in-class sessions, especially for
larger classes. This work describes the design and development of a Linux-based
distributed cyberinfrastructure (CI) to address this problem. Unlike typical
production-level environments, the CI is designed to be dynamically customized
and deployed on a federal cloud resource. Besides computing, the CI provides a
job scheduler, message passing, networked storage, and single sign-on mechanisms.
Configurations of these components can be adjusted prior to the automatic
installation process during deployment. Scalability is demonstrated as the numbers
of cores and shared storage nodes increase, showing that the proposed cluster
emulates a large-scale system.
1 Introduction
For the majority of teaching-focused higher education institutions, institutional
missions often emphasize community services, liberal arts, teaching quality,
accessibility, and commitment to diversity [8]. These institutions usually dis-
tribute financial and human resources to meet their primary teaching and stu-
dent service goals across all disciplines. This makes it difficult to support large-
scale cyberinfrastructure resources. While there is no previous study regarding
the availability of computing resources at smaller institutions, anecdotal ev-
idence points toward a clear lack of local resources for educational purposes
[4]. Even though there are large-scale national infrastructures such as XSEDE,
existing utilization from non-research institutions on these resources is low and
grows at a rate much smaller than that of research institutions. Furthermore,
high utilization rate from prioritized activities leads to increased wait time,
particularly during the day. This prevents effective implementation of in-class
hands-on learning activities.
Efforts have been made to develop affordable computing clusters that can
be used to teach basic PDC concepts. One approach is to boot a temporary
networked computer lab into a pre-configured distributed computing environ-
ment [3]. Combined with a small-scale multi-processor buildout hardware kit,
we have the ability to create inexpensive and portable mini clusters for edu-
cation [2]. Alternatively, advances in virtualization technologies have led to
solutions that support the creation of virtual computing clusters within exist-
ing computer laboratories. These clusters can either scale across all resources
[10] or be composed of many mini clusters for learning at individual levels [7].
Moving beyond existing on-premise resources, falling hardware costs have also
led to approaches centered on personal computing clusters, with costs ranging
from around $200 [11] to $3,000 [1]. In both scenarios, additional administrative
effort is required, which could be provided by student teams, supported by
institutional staff, or demand time from the instructors. This presents challenges
to institutions where resources are limited, teaching responsibilities are high,
and typical students are not prepared to take on advanced Linux system
administration tasks.
We present an approach that leverages cloud computing to provision a
cyberinfrastructure on which a full-fledged virtual supercomputer can be de-
signed and deployed. While conceptually similar to [7], our work does not rely
on premade VM component images. Instead, we utilize an academic cloud
[9] to create blueprints that correspond to components of a supercomputer in-
frastructure. At the cost of longer startup time, this allows us to provide a
high level of customization to clusters deployed through the platform. The
individual tasks in our CI blueprint are carried out as direct, automated Linux
instructions, which helps provide more insight into how the system works.
In addition to larger deployments supporting entire classes, the blueprint also
allows a smaller cluster (occupying only a portion of a physical node) to be
deployed should students wish to study on their own.
The remainder of this paper is organized as follows. Section 2 describes the
design and deployment environments of the proposed cloud-based CI. Section 3
presents and summarizes various administrative and performance scaling tests
regarding operations of the cloud-based CIs. Section 4 concludes the paper
and discusses future work.
2.1 CloudLab
Funded by the National Science Foundation in 2014, CloudLab has been built
to provide researchers with a robust cloud-based environment for next gener-
ation computing research [9]. As of Fall 2019, CloudLab boasts an impressive
collection of hardware. At the Utah site, there are a total of 785 nodes, includ-
ing 315 with ARMv8, 270 with Intel Xeon-D, and 200 with Intel Broadwell.
The compute nodes at Wisconsin include 270 Intel Haswell nodes with memory
ranging between 120GB and 160GB and 260 Intel Skylake nodes with mem-
ory ranging between 128GB and 192GB. At Clemson University, there are 100
nodes running Intel Ivy Bridge, 88 nodes running Intel Haswell, and 72 nodes
running Intel Skylake. All of Clemson’s compute nodes have large memory (be-
tween 256GB and 384GB), and there are also two additional storage-intensive
nodes that have a total of 270TB of storage available.
In order to provision resources using CloudLab, a researcher needs to de-
scribe the necessary computers, network topologies, and startup commands in
a resource description document. CloudLab provides a browser-based graphical
interface that allows users to visually design this document through drag-and-
drop actions. For large and complex profiles, this document can be automat-
ically generated via Python in a programmatic manner. Listing 1 describes a
Python script that will generate a resource description document that requests
six virtual machines, each of which has two cores, 4GB of RAM, and runs
CentOS 7. Their IP addresses range from 192.168.1.1 through 192.168.1.6.
import geni.portal as portal
import geni.rspec.pg as pg

pc = portal.Context()
request = pc.makeRequestRSpec()

# A single LAN connects all nodes through their internal experiment interfaces.
link = request.LAN("lan")

for i in range(6):
    if i == 0:
        # Head node: the only node with a publicly routable control address.
        node = request.XenVM("head")
        node.routable_control_ip = True
    elif i == 1:
        node = request.XenVM("metadata")
    elif i == 2:
        node = request.XenVM("storage")
    else:
        node = request.XenVM("compute-" + str(i))
    node.cores = 2
    node.ram = 4096
    node.disk_image = "urn:publicid:IDN+emulab.net+image+emulab-ops:CENTOS7-64-STD"
    # Attach each node to the LAN with a static internal address (192.168.1.1-6).
    iface = node.addInterface("if" + str(i - 3))
    iface.component_id = "eth1"
    iface.addAddress(pg.IPv4Address("192.168.1." + str(i + 1), "255.255.255.0"))
    link.addInterface(iface)

pc.printRequestRSpec(request)
Figure 1: Services provided by each server component
LDAP and Single-Sign-On: Our CI forwards the port of the SSH server
on the head node to allow users to sign in remotely. LDAP provides user ac-
counts with uniform UIDs and GIDs across all nodes in the CI. All nodes in
the CI authenticate accounts against LDAP, enabling a streamlined environ-
ment for all tasks, including passwordless SSH connections between nodes in
the server and shared network storage. The automatic deployment of LDAP is
facilitated through the use of Debian preseeding and configuration scripts. A
pre-configured list of users and groups is included to allow repeated deployment
and can be modified. LDAP is the first component to be deployed before any
other service.
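As a concrete illustration, the snippet below is a minimal sketch of how such an unattended LDAP installation might look on a Debian-based node; the domain, organization, password, and users.ldif file are placeholders rather than the blueprint's actual values.

#!/bin/bash
# Sketch: non-interactive OpenLDAP install via Debian preseeding.
# Domain, organization, and password values are placeholders.
LDAP_PW='ChangeMe'
debconf-set-selections <<EOF
slapd slapd/password1 password ${LDAP_PW}
slapd slapd/password2 password ${LDAP_PW}
slapd slapd/domain string vcluster.local
slapd shared/organization string vcluster
EOF
DEBIAN_FRONTEND=noninteractive apt-get install -y slapd ldap-utils

# Load the pre-configured list of users and groups shipped with the blueprint.
ldapadd -x -D "cn=admin,dc=vcluster,dc=local" -w "${LDAP_PW}" -f users.ldif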
Filesystems: By default, each instance on CloudLab comes with a total
storage space of 16GB. It is possible to attach storage blocks to each instance
to serve as expanded local scratch for the compute nodes. Two remote net-
work storage infrastructures are included with the CI blueprint. The first is
a network file system (NFS) that is set up on its own node. The NFS filesys-
tem provides four directories to be mounted across the compute and head
nodes: /home: provides shared user home directories, /software: provides
shared client software, /opt: provides shared system software, and /mpishare:
contains MPI sample codes for students. Housing home directories on the NFS
server provides uniform access to user files across the entire CI and allows the
user to easily run MPI scripts on all compute nodes. It also makes password-
less sign-on between nodes simpler, as each user will have a single SSH key in
the shared home directory. The second remote network storage infrastructure
is BeeGFS, a parallel file system. BeeGFS consists of three primary services:
management, metadata, and storage. In the current default blueprint, we pro-
vision one node for the management service, one node for the metadata service,
and two nodes for storage. Before deploying the CI, it is possible to customize
the blueprint to include more storage servers (improving parallel I/O). It is
also possible to merge all three services onto one or two nodes, at the cost of
performance. The BeeGFS service configuration is stored in a simple JSON
file that is loaded from the GitHub repository, allowing the user to have the same
PFS architecture each time the CI is deployed but also allowing the architec-
ture to be quickly updated before deployment. Once deployed, the BeeGFS
storage is mounted as /scratch. Under this directory, each individual user ac-
count in LDAP automatically has a subdirectory, named after the user’s login
name. These user scratch directories are available across head and all compute
nodes.
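A minimal sketch of how these two storage layers might be brought up is shown below; the export options, data paths, and the mgmt hostname are assumptions, and in the actual blueprint the role-to-node mapping comes from the JSON file described above.

# --- NFS server: export the four shared directories ---
cat >> /etc/exports <<'EOF'
/home      192.168.1.0/24(rw,sync,no_root_squash)
/software  192.168.1.0/24(rw,sync,no_root_squash)
/opt       192.168.1.0/24(rw,sync,no_root_squash)
/mpishare  192.168.1.0/24(rw,sync,no_root_squash)
EOF
exportfs -ra

# --- BeeGFS: role assignments come from the blueprint's JSON file ---
MGMT_HOST=mgmt   # management service node (placeholder hostname)
/opt/beegfs/sbin/beegfs-setup-mgmtd   -p /data/beegfs/mgmt                           # on the management node
/opt/beegfs/sbin/beegfs-setup-meta    -p /data/beegfs/meta    -s 1 -m "$MGMT_HOST"   # on the metadata node
/opt/beegfs/sbin/beegfs-setup-storage -p /data/beegfs/storage -s 1 -i 101 -m "$MGMT_HOST"  # on each storage node

# --- head and compute nodes: mount the parallel file system as /scratch ---
/opt/beegfs/sbin/beegfs-setup-client -m "$MGMT_HOST"
echo "/scratch /etc/beegfs/beegfs-client.conf" > /etc/beegfs/beegfs-mounts.conf
systemctl start beegfs-helperd beegfs-client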
Scheduler: We include SLURM as our default scheduler in the blueprint,
as it is the most popular scheduler across XSEDE sites. The NFS filesystem en-
ables distribution of configuration files for various SLURM components across
all nodes, including those of MariaDB, the back end database for SLURM. To
automate the configuration process, we created boilerplate configuration files
containing dummy information; these are updated with the correct values at deployment time.
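The following sketch illustrates this templating step; the template location and the @HEAD@/@NODES@ placeholder tokens are assumptions rather than the blueprint's actual file names.

# Sketch: fill a boilerplate slurm.conf with deployment-specific values.
sed -e "s/@HEAD@/head/" \
    -e "s/@NODES@/compute-[3-5]/" \
    /opt/templates/slurm.conf.in > /opt/slurm/slurm.conf

# Every node reads the same file through the NFS-mounted /opt tree.
ln -sf /opt/slurm/slurm.conf /etc/slurm/slurm.conf
systemctl restart slurmctld   # on the head node only
systemctl restart slurmd      # on each compute node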
Application (OpenMPI): Once SLURM has been installed and config-
ured, OpenMPI can be deployed. Because the virtual supercomputer is de-
ployed in a cloud environment and is also attached to the Internet for user
access via SSH, there are several network interfaces on each node. For Open-
MPI to run properly, it must be configured at install time to use only the
internal IP network. This is achieved through OpenMPI's configuration files, which
are loaded with the names of network interfaces provisioned at deployment.
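A minimal sketch of such a restriction, assuming an OpenMPI installation under /software and the eth1 interface requested in the profile above:

# Sketch: restrict OpenMPI traffic to the internal experiment network.
cat >> /software/openmpi/etc/openmpi-mca-params.conf <<'EOF'
btl_tcp_if_include = eth1
oob_tcp_if_include = eth1
EOF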
Compute Nodes: These run the clients for LDAP, NFS, BeeGFS, SLURM,
and OpenMPI, allowing them to obtain single sign-on, share directories, mount
scratch space, and participate in the parallel computing process. The /home,
/opt, /software, and /mpishare directories are mounted on each compute node
from the NFS server, and all compute nodes are provided scratch space on the
BeeGFS parallel file system.
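For illustration, the fragment below sketches how these client-side NFS mounts might appear on a compute node; the server address (192.168.1.3, the storage node in Listing 1) and the mount options are purely illustrative.

# Sketch: client-side NFS mounts on a compute node.
cat >> /etc/fstab <<'EOF'
192.168.1.3:/home      /home      nfs  defaults  0 0
192.168.1.3:/software  /software  nfs  defaults  0 0
192.168.1.3:/opt       /opt       nfs  defaults  0 0
192.168.1.3:/mpishare  /mpishare  nfs  defaults  0 0
EOF
mount -a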
2.2.1 Deployment
The hardware for our CI is requested from CloudLab using CloudLab’s Python
scripting support. The same Python script is used to launch accompanying
Bash scripts that deploy and configure CI software components. The Python
and Bash scripts are stored in a GitHub repository and updates are automat-
ically pulled by CloudLab, allowing the latest version of the CI to be incorpo-
rated into the corresponding CloudLab profile with each run.
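The sketch below shows the kind of per-node startup command such a profile might issue; the repository URL, clone location, and per-role script names are hypothetical.

# Sketch: per-node startup command issued by the CloudLab profile.
git clone https://github.com/example-org/virtual-cluster.git /local/repository
cd /local/repository
# Each node runs the installer matching its role (head, nfs, beegfs, compute).
sudo bash install_head.sh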
Figure 2: (a) Three physical computing nodes. (b) A single computing node.
The entire system deploys and configures itself automatically over the course
of several hours, depending on the number of nodes requested. The user who
deploys the system can place sample scripts, information, or data in the /source
directory on GitHub, and those files will be automatically copied into the
shared scratch directory accessible by all compute nodes. Figure 2a illustrates
a sample deployment of a supercomputer with one head node, one NFS node,
six BeeGFS nodes (one metadata, one management and four storage nodes),
and four compute nodes. The nodes in this deployment are spread across three
physical systems hosted at CloudLab’s Clemson site. A slightly smaller (in
terms of node count) deployment hosted at CloudLab’s Wisconsin site is shown
in Figure 2b. This deployment only has four BeeGFS nodes (one metadata,
one management and two storage nodes) instead of six.
Figure 3: Standard interaction with the scheduler for job submissions
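As an illustration of the interaction depicted in Figure 3, the sketch below submits a small MPI job through SLURM; the program name and resource sizes are illustrative.

# Sketch: a typical student job submission on the virtual cluster.
cat > hello_mpi.sbatch <<'EOF'
#!/bin/bash
#SBATCH --job-name=hello_mpi
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=2
#SBATCH --output=hello_mpi.%j.out
mpirun ./hello_mpi
EOF
sbatch hello_mpi.sbatch   # submit the job to SLURM
squeue -u "$USER"         # monitor its state in the queue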
With up to 8 processes running on the same virtual computing node, writing
activities are limited by a single virtual network connection and are therefore
unable to take advantage of the two BeeGFS storage servers. With more than
8 processes, speedups are observed, but they are capped at two because the same
limitation applies from the storage servers’ perspective.
References
[1] Joel C Adams and Tim H Brom. Microwulf: A Beowulf cluster for every
desk. ACM SIGCSE Bulletin, 40(1):121–125, 2008.
[2] Ivan Babic, Aaron Weeden, Mobeen Ludin, Skylar Thompson, Charles
Peck, Kristin Muterspaw, Andrew Fitz Gibbon, Jennifer Houchins, and
Tom Murphy. LittleFe and BCCD as a successful on-ramp to HPC. In Proceed-
ings of the 2014 Annual Conference on Extreme Science and Engineering
Discovery Environment, page 73. ACM, 2014.
[3] Sarah M Diesburg, Paul A Gray, and David Joiner. High performance
computing environments without the fuss: the Bootable Cluster CD. In
19th IEEE International Parallel and Distributed Processing Symposium,
8 pp. IEEE, 2005.
[4] Jeremy Fischer, Steven Tuecke, Ian Foster, and Craig A Stewart. Jet-
stream: a distributed cloud infrastructure for underresourced higher edu-
cation communities. In Proceedings of the 1st Workshop on The Science
of Cyberinfrastructure: Research, Experience, Applications and Models,
pages 53–61. ACM, 2015.
[9] Robert Ricci, Eric Eide, and the CloudLab Team. Introducing CloudLab:
Scientific infrastructure for advancing cloud architectures and applications.
;login: The Magazine of USENIX & SAGE, 39(6):36–38, 2014.
[10] Elizabeth Shoop, Richard Brown, Eric Biggers, Malcolm Kane, Devry Lin,
and Maura Warner. Virtual clusters for parallel and distributed education.
In Proceedings of the 43rd ACM technical symposium on Computer Science
Education, pages 517–522. ACM, 2012.
[11] David Toth. A portable cluster for each student. In 2014 IEEE Inter-
national Parallel & Distributed Processing Symposium Workshops, pages
1130–1134. IEEE, 2014.