Cluster Computing
A cluster is a group of linked computers working together closely so that in many respects they form a single computer. The components of a cluster are commonly, but not
always, connected to each other through fast local area networks. Clusters are usually deployed
to improve performance and/or availability over that of a single computer, while typically being
much more cost-effective than single computers of comparable speed or availability.
High Performance Computing (HPC) allows scientists and engineers to deal with very
complex problems using fast computer hardware and specialized software. Since these
problems often require hundreds or even thousands of processor hours to complete, an approach
based on the use of supercomputers has traditionally been adopted. The tremendous recent increase in the
speed of PC-type computers opens up a relatively cheap and scalable solution for HPC using cluster
technologies. Conventional MPP (Massively Parallel Processing) supercomputers are
oriented toward the very high end of performance. As a result, they are relatively expensive and
require specialized, and also expensive, maintenance support. A better understanding of applications and
algorithms, together with significant improvements in communication network technology and
processor speed, led to the emergence of a new class of systems, called clusters of SMPs (symmetric
multiprocessors) or networks of workstations (NOW), which can compete in performance
with MPPs and offer excellent price/performance ratios for particular application types.
A cluster is a group of independent computers working together as a single system to ensure that
mission-critical applications and resources are as highly available as possible. The group is
managed as a single system, shares a common namespace, and is specifically designed to tolerate
component failures, and to support the addition or removal of components in a way that's
transparent to users.
The development of new materials and production processes based on advanced technologies requires the
solution of increasingly complex computational problems. However, even as computer power,
data storage, and communication speed continue to improve exponentially, available
computational resources often fail to keep up with what users demand of them. Therefore,
high-performance computing (HPC) infrastructure has become a critical resource for research and
development as well as for many business applications. Traditionally, HPC applications were
oriented toward the use of high-end computer systems, so-called "supercomputers". Before
considering the amazing progress in this field, some attention should be paid to the classification
of existing computer architectures.
SISD (Single Instruction stream, Single Data stream) type computers. These are the conventional systems that contain one central processing unit (CPU)
and hence can accommodate one instruction stream that is executed serially. Nowadays many
large mainframes may have more than one CPU, but each of these executes instruction streams
that are unrelated. Therefore, such systems should still be regarded as a set of SISD machines
acting on different data spaces. Examples of SISD machines include most workstations, such as
those of DEC, IBM, Hewlett-Packard, and Sun Microsystems, as well as most personal computers.
SIMD (Single Instruction stream, Multiple Data stream) type computers. Such
systems often have a large number of processing units that all may execute the same instruction
on different data in lock-step, so a single instruction manipulates many data items in parallel.
Examples of SIMD machines are the CPP DAP Gamma II and the Alenia Quadrics.
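To make the SIMD idea concrete, here is a minimal sketch in Python (assuming the NumPy package, which the text does not mention): a single vectorized operation is applied to whole arrays at once, much as a SIMD machine applies one instruction to many data elements in lock-step.

    # Minimal sketch of the SIMD idea: one operation applied to many data
    # elements at once.  Assumes NumPy; the array sizes are arbitrary.
    import numpy as np

    a = np.arange(1_000_000, dtype=np.float64)   # one data stream ...
    b = np.arange(1_000_000, dtype=np.float64)   # ... and another

    # A single vectorized "instruction" manipulates all elements in parallel,
    # instead of an explicit element-by-element loop as on a SISD machine.
    c = a * 2.0 + b

    print(c[:5])   # first elements: 0.0, 3.0, 6.0, 9.0, 12.0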
Vector processors, a subclass of the SIMD systems. Vector processors act on arrays of similar
data rather than on single data items, using specially structured CPUs. When data can be
manipulated by these vector units, results can be delivered at a rate of one, two and, in special
cases, three per clock cycle (a clock cycle being defined as the basic internal unit of time for
the system). Vector processors thus execute on their data in an almost parallel way, but only when
running in vector mode; in that case they are several times faster than when executing in
conventional scalar mode. For practical purposes vector processors are therefore mostly regarded
as SIMD machines. Examples of such systems are the Cray 1 and the Hitachi S3600.
MIMD (Multiple Instruction stream, Multiple Data stream) type computers. These machines execute several
instruction streams in parallel on different data. The difference from the multi-processor SISD
machines mentioned above lies in the fact that the instructions and data are related, because they
represent different parts of the same task to be executed. MIMD systems may therefore run many
subtasks in parallel in order to shorten the time-to-solution for the main task. There is
a large variety of MIMD systems, ranging from the four-processor NEC SX-5 to the thousand-processor
SGI/Cray T3E supercomputer.
Besides the above classification, another important distinction between classes of computing
systems can be made according to the type of memory access:
Shared memory (SM) systems have multiple CPUs, all of which share the same address space.
This means that the knowledge of where data is stored is of no concern to the user as there is
only one memory accessed by all CPUs on an equal basis. Shared memory systems can be both
SIMD and MIMD. Single-CPU vector processors can be regarded as an example of the former,
while the multi-CPU models of these machines are examples of the latter.
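A minimal illustration of the shared-memory model in Python (the array size and thread count are arbitrary choices, not from the text): several threads update one array that lives in a single address space, so no explicit data movement is needed.

    # Shared-memory sketch: all threads see the same address space,
    # so they can update one array directly (a lock guards concurrent writes).
    import threading

    data = [0] * 8            # one array, visible to every thread
    lock = threading.Lock()

    def worker(start, stop):
        for i in range(start, stop):
            with lock:
                data[i] += i   # direct access; no explicit data movement

    threads = [threading.Thread(target=worker, args=(i, i + 4)) for i in (0, 4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    print(data)                # [0, 1, 2, 3, 4, 5, 6, 7]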
Distributed memory (DM) systems. In this case each CPU has its own associated memory. The
CPUs are connected by some network and may exchange data between their respective
memories when required. In contrast to shared memory machines the user must be aware of the
location of the data in the local memories and will have to move or distribute these data
explicitly when needed. Again, distributed memory systems may be either SIMD or MIMD.
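In contrast, a distributed-memory sketch must move data explicitly. The example below assumes the mpi4py package and a working MPI installation (neither is mentioned in the text) and would be launched with something like mpirun -np 2 python dm_sketch.py:

    # Distributed-memory sketch: each process (CPU) owns its local data,
    # and data is exchanged explicitly over the network with send/recv.
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()

    local = [rank * 10 + i for i in range(4)]   # data in this CPU's own memory

    if rank == 0:
        comm.send(local, dest=1, tag=0)          # explicit data movement ...
    elif rank == 1:
        remote = comm.recv(source=0, tag=0)      # ... programmed by the user
        print("rank 1 received", remote)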
Design:
Before attempting to build a cluster of any kind, think about the type of problems you are trying
to solve. Different kinds of applications will actually run at different levels of performance on
different kinds of clusters. Beyond the brute force characteristics of memory speed, I/O
bandwidth, disk seek/latency time and bus speed on the individual nodes of your cluster, the way
you connect your cluster together can have a great impact on its efficiency.
Network Selection.
Speed should be the main criterion for selecting the network. Channel bonding, a software
technique that ties multiple network connections together to increase the overall throughput of
the system, can be used to improve the performance of Ethernet networks.
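As a rough sketch of what channel bonding involves on a Linux node, the commands below (wrapped in Python purely for illustration) enslave two hypothetical interfaces, eth1 and eth2, to a bonded device using the iproute2 tools; in practice this is normally done in the distribution's network configuration, and the interface names and address here are placeholders.

    # Hypothetical channel-bonding sketch: enslave two NICs to a bonded
    # interface with iproute2.  Requires root; names/address are placeholders.
    import subprocess

    def run(cmd):
        print("+", cmd)
        subprocess.run(cmd.split(), check=True)

    run("ip link add bond0 type bond mode balance-rr")  # round-robin bonding
    run("ip link set eth1 down")
    run("ip link set eth1 master bond0")
    run("ip link set eth2 down")
    run("ip link set eth2 master bond0")
    run("ip link set bond0 up")
    run("ip addr add 192.168.1.10/24 dev bond0")        # cluster-side address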
Security Considerations
Special considerations are involved when completing the implementation of a cluster. Even with
the queue system and parallel environment, extra services are required for a cluster to function as
a multi-user computational platform. These services include the well-known network services
NFS, NIS and rsh. NFS allows cluster nodes to share user home directories as well as installation
files for the queue system and parallel environment. NIS provides correct file and process
ownership across all the cluster nodes from a single source on the master machine. Although
these services are significant components of a cluster, they also create numerous
vulnerabilities. Thus, it would be insecure to have cluster nodes operate on an open network.
For these reasons, computational cluster nodes usually reside on private networks, often
accessible to users only through a firewall gateway. In most cases, the firewall is configured on
the master node using ipchains or iptables.
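A hedged sketch of such a firewall on the master node, using iptables (the interface names eth0 for the public side and eth1 for the private cluster side are assumptions, not taken from the text):

    # Hypothetical master-node firewall sketch using iptables.
    # eth0 = public interface, eth1 = private cluster network (assumed names).
    import subprocess

    def iptables(rule):
        subprocess.run(["iptables"] + rule.split(), check=True)

    iptables("-P INPUT DROP")                                   # default: drop
    iptables("-A INPUT -i lo -j ACCEPT")                        # loopback
    iptables("-A INPUT -i eth1 -j ACCEPT")                      # trust cluster side
    iptables("-A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT")
    iptables("-A INPUT -i eth0 -p tcp --dport 22 -j ACCEPT")    # ssh from outside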
Having all cluster machines on the same private network requires them to be connected to the
same switch (or linked switches) and, therefore, located in close physical proximity. This situation
creates a severe limitation in terms of cluster scalability. It is impossible to combine private-network
machines in different geographic locations into one joint cluster, because private
network addresses are not routable over the public Internet.
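The routability point can be checked with Python's standard ipaddress module: the RFC 1918 blocks commonly used for cluster nodes are marked private and are not announced on the public Internet (the sample addresses below are illustrative only).

    # RFC 1918 addresses are reserved for private networks and are not
    # routed on the public Internet, so nodes using them cannot be reached
    # directly from another site.
    import ipaddress

    for addr in ("192.168.1.10", "10.0.0.5", "172.16.4.2", "8.8.8.8"):
        ip = ipaddress.ip_address(addr)
        print(addr, "private" if ip.is_private else "public")
    # 192.168.1.10 private
    # 10.0.0.5 private
    # 172.16.4.2 private
    # 8.8.8.8 public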
Combining cluster resources at different locations, so that users from various departments can
take advantage of all available computational nodes, is nevertheless possible. Merging
clusters is not only desirable but also advantageous, in the sense that clusters physically spread
over several sites can be managed and used as one centralized resource. This setup provides higher availability
and efficiency, and such a proposition is highly attractive. But in order to merge
clusters, all the machines would have to be on a public network instead of a private one, because
every single node in every cluster needs to be directly accessible from the others. Doing this,
however, would create insurmountable problems because of the potential, indeed inevitable,
security breaches. To serve scalability, then, we severely compromise
security, but where we satisfy security concerns, scalability becomes significantly limited. Faced
with such a problem, how can we make clusters scalable and, at the same time, establish rock-solid
security on the cluster networks? Enter the Virtual Private Network (VPN).
VPNs are often heralded as one of the most cutting-edge, cost-saving solutions for various
applications, and they are widely deployed in the areas of security, infrastructure expansion and
inter-networking. A VPN adds another dimension to networking and infrastructure because it
enables private networks to be connected in secure and robust ways. Private networks generally
are not accessible from the Internet and are networked only within confined locations.
The technology behind VPNs, however, changes what we have previously known about private
networks. Through effective use of a VPN, we are able to connect previously unrelated private
networks or individual hosts both securely and transparently. Being able to connect private
networks opens a whole slew of new possibilities. With a VPN, we are not limited to resources
in only one location (a single private network). We can finally take advantage of resources and
information from all other private networks connected via VPN gateways, without having to
make major changes to what we already have in our networks. In many cases, a VPN is an invaluable
solution for integrating and better utilizing fragmented resources.
In our environment, the VPN plays a significant role in combining high performance Linux
computational clusters located on separate private networks into one large cluster. The VPN,
with its power to transparently combine two private networks through an existing open network,
enabled us to seamlessly connect two unrelated clusters in different physical locations. The VPN
connection creates a tunnel between gateways that allows hosts on two different subnets (e.g.,
192.168.1.0/24 and 192.168.5.0/24) to see each other as if they were on the same network. Thus,
we were able to operate critical network services such as NFS, NIS, rsh and the queue system
over two different private networks, without compromising security over the open network.
Furthermore, the VPN encrypts all the data being passed through the established tunnel and
makes the network more secure and less prone to malicious exploits.
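The routing idea behind the tunnel can be sketched as follows (the subnets are the ones from the example above; the gateway address is hypothetical): a node on 192.168.1.0/24 decides whether a destination is local or must be handed to the VPN gateway that tunnels traffic to 192.168.5.0/24.

    # Sketch of the routing decision behind the VPN tunnel: traffic for the
    # remote private subnet goes to the local VPN gateway, which encrypts it
    # and forwards it through the tunnel.
    import ipaddress

    LOCAL_NET   = ipaddress.ip_network("192.168.1.0/24")   # this cluster
    REMOTE_NET  = ipaddress.ip_network("192.168.5.0/24")   # cluster behind the VPN
    VPN_GATEWAY = "192.168.1.1"                            # hypothetical gateway

    def next_hop(dest):
        ip = ipaddress.ip_address(dest)
        if ip in LOCAL_NET:
            return "deliver directly on the local network"
        if ip in REMOTE_NET:
            return f"send to VPN gateway {VPN_GATEWAY} (encrypted tunnel)"
        return "send to default gateway / firewall"

    print(next_hop("192.168.1.42"))   # local node
    print(next_hop("192.168.5.17"))   # node in the remote cluster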
The VPN solved not only the previously discussed problems with security, but it also opened a
new door for scalability. Since all the cluster nodes can reside in private networks and operate
through the VPN, the entire infrastructure can be better organized and the IP addresses can be
efficiently managed, resulting in a more scalable and much cleaner network. Before VPNs,
assigning public IP addresses to every single node was an open problem that
limited the maximum number of nodes that could be added to the cluster. Now, with a VPN, the
cluster can expand to a much greater scale in an organized manner. As can be seen, we
have successfully integrated VPN technology into our networks and have addressed important
issues of scalability, accessibility and security in cluster computing.